Boost Sales, Today!

Robin Bista

05/20/2021

Introduction

Today's fast-changing retail industry expects retailers to know their customers' shopping behavior in advance. Sales optimization requires accurate prediction of customers' shopping habits and stocking the inventory ahead of demand. Failing to do so degrades the shopping experience and shrinks the customer base. Today, let us explore the Groceries dataset in R to find associations between the 20 most frequently sold items and the complementary products bought with them. We will use these associations to suggest that retailers place such products in adjacent aisles for a smoother shopping experience and better sales.

1. Apriori Algorithm

The Apriori algorithm, as the name suggests, uses prior knowledge of frequent itemset properties to find relations between items. It applies an iterative, level-wise search in which frequent k-itemsets are used to find frequent (k+1)-itemsets. The algorithm uses the Apriori property to make this level-wise generation more efficient by reducing the search space. The Apriori property states that all non-empty subsets of a frequent itemset must also be frequent. Let us take an example to further understand the algorithm.

The Cake dataset below consists of a few imaginary items purchased from a retail store:

The Association Rules:

The dataset helps us construct a set of rules as follows:

Rule 1: If Flour is purchased, then Egg is also purchased.

Rule 2: If Egg is purchased, then Flour is also purchased.

Rule 3: If Flour and Egg are purchased, then Sugar is also purchased in 60% of the transactions.

The rules above state:

1. Whenever Flour is purchased, Egg is also purchased, and vice versa.
2. If Flour and Egg are purchased, then Sugar is also purchased. This is true in 3 out of the 5 transactions.

If {Flour} and {Sugar} are both one-item sets, a new two-item set {Flour, Sugar} can be formed by combining them. The new set is used to identify the products purchased when both flour and sugar are purchased. To form an association between itemsets, the items of a transaction are split into a Left Hand Side (LHS) and a Right Hand Side (RHS). Every purchase of {Flour} along with {Sugar} is represented as {Sugar} => {Flour}. Here {Sugar} is the LHS and {Flour} is the RHS. This association can be used to move from k-itemsets to (k+1)-itemsets. For example, transactions including {Sugar, Flour} have a high chance of also including Baking Soda.

In this way, the Apriori algorithm uses frequent k-itemsets to search for (k+1)-itemsets. It starts with the frequent 1-itemsets, uses them to find the 2-itemsets, and repeats until no larger frequent itemset can be found.
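The level-wise search can be sketched in a few lines of base R. This is only an illustrative sketch, not the arules implementation: the five transactions are a hypothetical stand-in for the Cake dataset (chosen to agree with the three rules above), and the names `support`, `apriori_sketch`, and `min_support` are made up for this example.

```r
# Hypothetical toy transactions, consistent with Rules 1 to 3 above.
transactions <- list(
  c("Flour", "Egg", "Sugar"),
  c("Flour", "Egg", "Sugar", "Baking Soda"),
  c("Flour", "Egg", "Sugar"),
  c("Flour", "Egg"),
  c("Flour", "Egg", "Milk")
)

# Support: share of transactions that contain every item of the itemset.
support <- function(itemset, trans) {
  mean(vapply(trans, function(t) all(itemset %in% t), logical(1)))
}

# Level-wise search: frequent k-itemsets are joined into (k+1)-candidates.
# A candidate survives only if all of its k-subsets are frequent
# (the Apriori property) and its own support reaches min_support.
apriori_sketch <- function(trans, min_support) {
  items <- sort(unique(unlist(trans)))
  level <- Filter(function(s) support(s, trans) >= min_support, as.list(items))
  frequent <- level
  while (length(level) > 1) {
    k <- length(level[[1]])
    keys <- vapply(level, paste, character(1), collapse = "|")
    # Join step: unions of two k-itemsets that have exactly k + 1 items.
    cands <- list()
    for (i in seq_along(level)) {
      for (j in seq_along(level)) {
        if (i < j) {
          u <- sort(union(level[[i]], level[[j]]))
          if (length(u) == k + 1) cands[[paste(u, collapse = "|")]] <- u
        }
      }
    }
    # Prune step: check the Apriori property, then the support threshold.
    keep <- Filter(function(cand) {
      subsets <- combn(cand, k, simplify = FALSE)
      all_frequent <- all(vapply(
        subsets, function(s) paste(s, collapse = "|") %in% keys, logical(1)
      ))
      all_frequent && support(cand, trans) >= min_support
    }, cands)
    level <- unname(keep)
    frequent <- c(frequent, level)
  }
  frequent
}

frequent_sets <- apriori_sketch(transactions, min_support = 0.5)
# Confidence of {Flour, Egg} => {Sugar} on the toy data.
conf_rule3 <- support(c("Flour", "Egg", "Sugar"), transactions) /
  support(c("Flour", "Egg"), transactions)
```

On this toy data, Baking Soda and Milk are dropped at the first level, so no candidate containing them is ever generated. That is the saving the Apriori property provides.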

1.1 Dataset Handling

Let us analyze the “Groceries” data in R, where the store's transactions are held in a dedicated object of class “transactions”.
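The code chunk for this step is not shown in the rendered output. A minimal sketch of how the data is probably loaded, assuming the arules package is installed, might look like this:

```r
library(arules)      # provides the transactions class and the Groceries data
data("Groceries")    # loads the object `Groceries` into the workspace

class(Groceries)     # class of the loaded object
dim(Groceries)       # number of transactions, number of items
```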

## Loading required package: arules
## Loading required package: Matrix
## 
## Attaching package: 'arules'
## The following objects are masked from 'package:base':
## 
##     abbreviate, write

1.2 Overview

## Formal class 'transactions' [package "arules"] with 3 slots
##   ..@ data       :Formal class 'ngCMatrix' [package "Matrix"] with 5 slots
##   .. .. ..@ i       : int [1:43367] 13 60 69 78 14 29 98 24 15 29 ...
##   .. .. ..@ p       : int [1:9836] 0 4 7 8 12 16 21 22 27 28 ...
##   .. .. ..@ Dim     : int [1:2] 169 9835
##   .. .. ..@ Dimnames:List of 2
##   .. .. .. ..$ : NULL
##   .. .. .. ..$ : NULL
##   .. .. ..@ factors : list()
##   ..@ itemInfo   :'data.frame':  169 obs. of  3 variables:
##   .. ..$ labels: chr [1:169] "frankfurter" "sausage" "liver loaf" "ham" ...
##   .. ..$ level2: Factor w/ 55 levels "baby food","bags",..: 44 44 44 44 44 44 44 42 42 41 ...
##   .. ..$ level1: Factor w/ 10 levels "canned food",..: 6 6 6 6 6 6 6 6 6 6 ...
##   ..@ itemsetInfo:'data.frame':  0 obs. of  0 variables

The transactions object is internally divided into 3 slots: data, itemInfo, and itemsetInfo. The data slot holds a sparse matrix (class ngCMatrix) together with its dimensions (169 items by 9835 transactions) and dimension names, and records which items were purchased in each transaction.

##               labels     level2               level1
## 1        frankfurter    sausage     meat and sausage
## 2            sausage    sausage     meat and sausage
## 3         liver loaf    sausage     meat and sausage
## 4                ham    sausage     meat and sausage
## 5               meat    sausage     meat and sausage
## 6  finished products    sausage     meat and sausage
## 7    organic sausage    sausage     meat and sausage
## 8            chicken    poultry     meat and sausage
## 9             turkey    poultry     meat and sausage
## 10              pork       pork     meat and sausage
## 11              beef       beef     meat and sausage
## 12    hamburger meat       beef     meat and sausage
## 13              fish       fish     meat and sausage
## 14      citrus fruit      fruit fruit and vegetables
## 15    tropical fruit      fruit fruit and vegetables
## 16         pip fruit      fruit fruit and vegetables
## 17            grapes      fruit fruit and vegetables
## 18           berries      fruit fruit and vegetables
## 19       nuts/prunes      fruit fruit and vegetables
## 20   root vegetables vegetables fruit and vegetables

The first 20 rows of the “itemInfo” slot give the names of the items under the column “labels”. The “level1” column assigns each item to a broad category and “level2” places it in a more specific group, which helps when looking for associations at different levels of detail.
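The two overview listings above could be produced along the following lines. This is a sketch under the assumption that `str()` and `itemInfo()` were used; the original chunk is not shown.

```r
library(arules)
data("Groceries")

str(Groceries)                        # slot structure: data, itemInfo, itemsetInfo
item_info <- itemInfo(Groceries)      # data frame with labels, level2, level1
head(item_info, 20)                   # first 20 items and their categories
```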

2. Implementing Apriori Algorithm

## Apriori
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##         0.8    0.1    1 none FALSE            TRUE       5   0.001      1
##  maxlen target  ext
##      10  rules TRUE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
## 
## Absolute minimum support count: 9 
## 
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[169 item(s), 9835 transaction(s)] done [0.00s].
## sorting and recoding items ... [157 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 5 6 done [0.01s].
## writing ... [410 rule(s)] done [0.00s].
## creating S4 object  ... done [0.00s].

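The minimum support parameter (minSup) is set to 0.001 and, as the output above shows, the minimum confidence (minConf) to 0.8. Minimum confidence can take values between 0.75 and 0.85 for varied results. With 9835 transactions, a support of 0.001 corresponds to the absolute minimum support count of 9 reported above. Support, confidence, and lift are explained below: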
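A call that matches the parameter listing above (support 0.001, confidence 0.8, other settings left at their defaults) could be sketched as follows. The variable name `rules` is an assumption; the original chunk is hidden.

```r
library(arules)
data("Groceries")

# Mine association rules with the thresholds reported in the output above.
rules <- apriori(
  Groceries,
  parameter = list(support = 0.001, confidence = 0.8)
)

length(rules)    # number of rules found
summary(rules)   # distribution of rule lengths and quality measures
```

Raising `confidence` towards 0.85 shrinks the rule set, while lowering it towards 0.75 enlarges it.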

Support:

Support can be understood as the general probability of a particular event occurring. For example, let's assume an event named ‘Buy’, which represents buying a product. In this case, the support of ‘Buy’ is the number of transactions including ‘Buy’ divided by the total number of transactions in the store.

Confidence:

The confidence of a rule is the conditional probability that one event occurs given that another has already occurred. In general terms, it is the chance of A happening given that B has already happened. For a rule LHS => RHS, it is the support of LHS and RHS together divided by the support of LHS.

Lift:

Lift is the ratio of confidence to expected confidence. Equivalently, it is the probability of all of the items in a rule occurring together, divided by the product of the probabilities of the left-hand side and the right-hand side occurring. The lift value indicates how well a rule predicts an association between items: a lift above 1 means the items occur together more often than they would if they were independent. The higher the lift, the stronger the association.
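As a worked example, the three measures can be recomputed in base R for the rule {liquor, red/blush wine} => {bottled beer} from the rule listing in this section. The count of 19 and the total of 9835 transactions come from the output. The LHS count of 21 is the reported coverage multiplied by 9835. The RHS count of 792 is an assumption, back-calculated from the reported lift rather than read from the output. The function name `rule_metrics` is made up for this sketch.

```r
# Support, confidence and lift of a rule LHS => RHS from raw counts.
rule_metrics <- function(n_both, n_lhs, n_rhs, n_total) {
  support    <- n_both / n_total             # P(LHS and RHS)
  confidence <- n_both / n_lhs               # P(RHS | LHS)
  lift       <- confidence / (n_rhs / n_total)  # confidence / P(RHS)
  list(support = support, confidence = confidence, lift = lift)
}

m <- rule_metrics(
  n_both  = 19,    # transactions with liquor, red/blush wine and bottled beer
  n_lhs   = 21,    # transactions with liquor and red/blush wine
  n_rhs   = 792,   # assumed: transactions with bottled beer
  n_total = 9835   # all transactions
)
print(m)
```

Under these assumptions the values agree with the reported support (about 0.0019), confidence (about 0.905), and lift (about 11.2).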

The first 20 rules, with their support, confidence, coverage, lift, and count:

##      lhs                         rhs                    support confidence    coverage      lift count
## [1]  {liquor,                                                                                         
##       red/blush wine}         => {bottled beer}     0.001931876  0.9047619 0.002135231 11.235269    19
## [2]  {curd,                                                                                           
##       cereals}                => {whole milk}       0.001016777  0.9090909 0.001118454  3.557863    10
## [3]  {yogurt,                                                                                         
##       cereals}                => {whole milk}       0.001728521  0.8095238 0.002135231  3.168192    17
## [4]  {butter,                                                                                         
##       jam}                    => {whole milk}       0.001016777  0.8333333 0.001220132  3.261374    10
## [5]  {soups,                                                                                          
##       bottled beer}           => {whole milk}       0.001118454  0.9166667 0.001220132  3.587512    11
## [6]  {napkins,                                                                                        
##       house keeping products} => {whole milk}       0.001321810  0.8125000 0.001626843  3.179840    13
## [7]  {whipped/sour cream,                                                                             
##       house keeping products} => {whole milk}       0.001220132  0.9230769 0.001321810  3.612599    12
## [8]  {pastry,                                                                                         
##       sweet spreads}          => {whole milk}       0.001016777  0.9090909 0.001118454  3.557863    10
## [9]  {turkey,                                                                                         
##       curd}                   => {other vegetables} 0.001220132  0.8000000 0.001525165  4.134524    12
## [10] {rice,                                                                                           
##       sugar}                  => {whole milk}       0.001220132  1.0000000 0.001220132  3.913649    12
## [11] {butter,                                                                                         
##       rice}                   => {whole milk}       0.001525165  0.8333333 0.001830198  3.261374    15
## [12] {domestic eggs,                                                                                  
##       rice}                   => {whole milk}       0.001118454  0.8461538 0.001321810  3.311549    11
## [13] {rice,                                                                                           
##       bottled water}          => {whole milk}       0.001220132  0.9230769 0.001321810  3.612599    12
## [14] {yogurt,                                                                                         
##       rice}                   => {other vegetables} 0.001931876  0.8260870 0.002338587  4.269346    19
## [15] {oil,                                                                                            
##       mustard}                => {whole milk}       0.001220132  0.8571429 0.001423488  3.354556    12
## [16] {canned fish,                                                                                    
##       hygiene articles}       => {whole milk}       0.001118454  1.0000000 0.001118454  3.913649    11
## [17] {herbs,                                                                                          
##       fruit/vegetable juice}  => {other vegetables} 0.001220132  0.8000000 0.001525165  4.134524    12
## [18] {herbs,                                                                                          
##       shopping bags}          => {other vegetables} 0.001931876  0.8260870 0.002338587  4.269346    19
## [19] {tropical fruit,                                                                                 
##       herbs}                  => {whole milk}       0.002338587  0.8214286 0.002846975  3.214783    23
## [20] {herbs,                                                                                          
##       rolls/buns}             => {whole milk}       0.002440264  0.8000000 0.003050330  3.130919    24

The table above lists 20 of the rules produced from the Groceries data. The first rule states that when liquor and red/blush wine are bought, bottled beer is likely to be bought as well.
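The lift column in the listing is not in decreasing order, so the rules appear to be shown in the order in which they were mined. A sketch of how to print the first 20 rules, and how to rank them by lift instead, assuming the rules are mined with the thresholds reported earlier:

```r
library(arules)
data("Groceries")

rules <- apriori(
  Groceries,
  parameter = list(support = 0.001, confidence = 0.8),
  control   = list(verbose = FALSE)   # suppress the progress log
)

inspect(rules[1:20])                  # first 20 rules in mined order

# Rank by lift (decreasing) and keep the 20 strongest associations.
top_by_lift <- head(sort(rules, by = "lift"), 20)
inspect(top_by_lift)
```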

3. Interpretations and Analysis

3.1 The Item Frequency Histogram

The histogram below shows how frequently each item occurs in the dataset relative to the other items. The relative frequency plot shows that “Whole Milk” and “Other Vegetables” are the top two most purchased products.

## [1] 5.1 4.1 4.1 2.1

The graph above shows that people buy milk and vegetables more often than other items in the store. Now, let us place related items near milk and vegetables to optimize sales. Bread and eggs can be a great complement.
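A plot of this kind can be drawn with `itemFrequencyPlot()` from arules. The sketch below assumes the top 20 items and relative frequencies were plotted, in line with the introduction; the exact arguments of the original chunk are not shown.

```r
library(arules)
data("Groceries")

# Bar plot of the 20 most frequent items, as a share of all transactions.
itemFrequencyPlot(Groceries, topN = 20, type = "relative")

# The same information as numbers, sorted from most to least frequent.
item_freq <- sort(itemFrequency(Groceries), decreasing = TRUE)
head(item_freq, 5)
```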

3.2 Graphical Representation

The graph below shows the support and lift of multiple items in the inventory and the associations among those items. The size of each node is based on support and its color is based on lift.

## Warning: Unknown control parameters: type
## Available control parameters (with default values):
## layout    =  stress
## circular  =  FALSE
## ggraphdots    =  NULL
## edges     =  <environment>
## nodes     =  <environment>
## nodetext  =  <environment>
## colors    =  c("#EE0000FF", "#EEEEEEFF")
## engine    =  ggplot2
## max   =  100
## verbose   =  FALSE

It is clear that most of the transactions revolve around Whole Milk. Liquor and wine also show a strong association. Similarly, tropical fruits and herbs are related to rolls and buns. A bit odd, but that's okay! These items should be placed in the same aisle.

Each black box represents a non-zero value, that is, an association between an item and a transaction.
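A graph of this kind can be produced with the arulesViz package. The sketch below is an assumption about the hidden chunk: it plots the 20 rules with the highest lift and leaves out the `type` control parameter, which the warning above reports as unknown.

```r
library(arules)
library(arulesViz)
data("Groceries")

rules <- apriori(
  Groceries,
  parameter = list(support = 0.001, confidence = 0.8),
  control   = list(verbose = FALSE)
)

# Keep the graph readable by plotting only the strongest rules.
top_rules <- head(sort(rules, by = "lift"), 20)

# Items and rules become nodes; node size and color encode the rule measures.
plot(top_rules, method = "graph")
```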

3.3 Interactive Scatterplot

The interactive plot visualizes the association rules as a scatter plot. The x-axis and the y-axis represent support and confidence respectively. Let’s move around the scatter plot and see the results.

## To reduce overplotting, jitter is added! Use jitter = 0 to prevent jitter.

Moving around the plot displays the lift, support, and confidence for each set of items. The rule {liquor, red/blush wine} => {bottled beer} has a confidence of about 0.90 and a high lift of 11.2, so it is a suitable set of items to place together.
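An interactive scatter plot like this one can be sketched with arulesViz as follows. The `engine = "htmlwidget"` choice is an assumption, since the original chunk is hidden, and this engine needs the plotly package to be installed.

```r
library(arules)
library(arulesViz)
data("Groceries")

rules <- apriori(
  Groceries,
  parameter = list(support = 0.001, confidence = 0.8),
  control   = list(verbose = FALSE)
)

# Support on the x-axis, confidence on the y-axis, lift as color.
scatter <- plot(
  rules,
  method  = "scatterplot",
  measure = c("support", "confidence"),
  shading = "lift",
  engine  = "htmlwidget"
)
```

Printing `scatter` in an interactive session opens the widget, where hovering over a point shows the rule and its measures.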

4. Conclusion

After visualizing the plots above, a more detailed and effective strategy can be implemented to place related items together. The Groceries transactions show a strong association between “Whole Milk” and “Vegetables”, and between “Wine” and “Bottled Beer”. Dedicated aisles give customers a smooth and pleasant shopping experience with easy access to related items. At the same time, they act as a catalyst to boost the store's sales.

Aisles Proposed:

Liquor Aisle – Liquor, Red/Blush Wine, Bottled Beer
Groceries Aisle – Other vegetables, Whole milk, Oil, Yogurt, Rice, Root Vegetable
Fruit Aisle – Citrus Fruit, Grape, Fruit/Vegetable juice, Tropical fruits
Breakfast Aisle – Pastry, Curd, Cereals, Sweet Spreads

Now you know the tricks retailers adopt for your convenient shopping experience and to boost their sales!

Keep shopping, enjoy shopping!!